import pandas as pd
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"
For this exercise, we have written the following code to load the stock dataset built into Plotly Express.
stocks = px.data.stocks()
stocks.head()
| | date | GOOG | AAPL | AMZN | FB | NFLX | MSFT |
|---|---|---|---|---|---|---|---|
| 0 | 2018-01-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 1 | 2018-01-08 | 1.018172 | 1.011943 | 1.061881 | 0.959968 | 1.053526 | 1.015988 |
| 2 | 2018-01-15 | 1.032008 | 1.019771 | 1.053240 | 0.970243 | 1.049860 | 1.020524 |
| 3 | 2018-01-22 | 1.066783 | 0.980057 | 1.140676 | 1.016858 | 1.307681 | 1.066561 |
| 4 | 2018-01-29 | 1.008773 | 0.917143 | 1.163374 | 1.018357 | 1.273537 | 1.040708 |
Select a stock and create a suitable plot for it. Make sure the plot is readable and shows relevant information, such as dates and values.
goog_stocks = stocks[['date','GOOG']]
x1 = goog_stocks['date']
y1 = goog_stocks['GOOG']
fig1, ax1 = plt.subplots(figsize=(15,9))
ax1.plot(x1,y1)
# Set title
ax1.set_title('Google stock')
# Label the horizontal axis
xticks1 = range(0, stocks.shape[0], 10)
ax1.set_xticks(xticks1)
ax1.set_xlabel('Date')
# Label the vertical axis
ax1.set_ylabel('Stock Value')
plt.show()
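The plot above places a tick at every tenth row because the dates are drawn as plain labels. One alternative, sketched here on the first five GOOG rows from the table above, is to parse the dates with `pd.to_datetime` so that matplotlib can place and format date ticks itself. This assumes the `date` column holds text, as it appears to in the `head()` output; the names `sample`, `fig_d` and `ax_d` are illustrative only.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# First five rows of the stocks table shown above (GOOG column only)
sample = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-08", "2018-01-15", "2018-01-22", "2018-01-29"],
    "GOOG": [1.000000, 1.018172, 1.032008, 1.066783, 1.008773],
})

# Parse the text dates so the x axis is treated as time, not as categories
sample["date"] = pd.to_datetime(sample["date"])

fig_d, ax_d = plt.subplots(figsize=(8, 4))
ax_d.plot(sample["date"], sample["GOOG"], marker="o")

# Let matplotlib choose tick positions and print them as year-month-day
ax_d.xaxis.set_major_locator(mdates.AutoDateLocator())
ax_d.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig_d.autofmt_xdate()  # rotate the labels so they do not overlap

ax_d.set_title("Google stock")
ax_d.set_xlabel("Date")
ax_d.set_ylabel("Stock Value")
```

In a notebook, the figure is displayed once the cell finishes or when `plt.show()` is called. The same conversion could be applied to the full `stocks` dataframe before plotting.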
You have already plotted data for one stock. You can plot several stocks on the same axes to support comparison.
To distinguish the lines, customise line styles, markers and colors, and add a legend to the plot.
fig2, ax2 = plt.subplots(figsize=(15,9))
# Extract values to use for the horizontal axis
x2 = stocks['date']
# Plot each set of y values - the stock values of each stock
ax2.plot(x2, stocks['GOOG'], label = "GOOG", linestyle='-', marker='^')
ax2.plot(x2, stocks['AAPL'], label = "AAPL", linestyle='--', marker='.')
ax2.plot(x2, stocks['AMZN'], label = "AMZN", linestyle=':', marker='>')
ax2.plot(x2, stocks['FB'], label = "FB", linestyle='-.', marker='p')
ax2.plot(x2, stocks['NFLX'], label = "NFLX", linestyle='--', marker='o')
ax2.plot(x2, stocks['MSFT'], label = "MSFT", linestyle=':', marker='*')
# Set title
ax2.set_title('Stocks')
# Label the horizontal axis
# Use the x-axis ticks created in Question 1
ax2.set_xticks(xticks1)
ax2.set_xlabel('Date')
# Label the vertical axis
ax2.set_ylabel('Stock Value')
plt.legend()
plt.show()
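The six `ax2.plot` calls above differ only in the column name, line style and marker, so the same plot could be driven from a small lookup table. The sketch below shows the idea on the first three rows of three stocks from the table above; the `styles` dictionary and the names `sample`, `fig_l` and `ax_l` are illustrative, not part of the exercise.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First three rows of three stocks, copied from the table shown earlier
sample = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-08", "2018-01-15"],
    "GOOG": [1.000000, 1.018172, 1.032008],
    "AAPL": [1.000000, 1.011943, 1.019771],
    "AMZN": [1.000000, 1.061881, 1.053240],
})

# One (linestyle, marker) pair per stock, matching the choices used above
styles = {"GOOG": ("-", "^"), "AAPL": ("--", "."), "AMZN": (":", ">")}

fig_l, ax_l = plt.subplots(figsize=(8, 4))
for name, (linestyle, marker) in styles.items():
    # The label is what the legend displays for this line
    ax_l.plot(sample["date"], sample[name], label=name,
              linestyle=linestyle, marker=marker)

ax_l.set_title("Stocks")
ax_l.set_xlabel("Date")
ax_l.set_ylabel("Stock Value")
ax_l.legend()
```

Adding another stock then means adding one entry to `styles` rather than another `plot` call.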
First, load the tips dataset.
tips = sns.load_dataset('tips')
tips.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
Let's explore this dataset. Pose a question and create a plot that helps answer it.
My question is:
# Generate a box plot to compare tip amounts between the Female and Male groups
fig3, ax3 = plt.subplots(figsize=(15,9))
# Draw on the axes created above so the figure size is applied explicitly
sns.boxplot(x='sex', y='tip', data=tips, ax=ax3)
plt.show()
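The line inside each box marks the median, so a grouped median gives a number to read alongside the plot. The sketch below uses only the five rows shown in the `head()` output above, which is far too few to answer the question; it is meant to illustrate the `groupby` call, and `sample_tips` and `median_tip` are illustrative names.

```python
import pandas as pd

# The tip and sex columns of the five rows shown in the table above
sample_tips = pd.DataFrame({
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# Median tip per group: the value each box plot draws as its centre line
median_tip = sample_tips.groupby("sex")["tip"].median()
print(median_tip)
```

Running the same `groupby` on the full `tips` dataframe would give the medians that correspond to the box plot.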
Redo the above exercises (challenges 2 & 3) with plotly express. Create diagrams which you can interact with.
Hints:
# Rearrange the stocks dataframe from wide to long format,
# turning the stock columns into rows while keeping the date as an identifier column
stocks_new = stocks.melt(id_vars=["date"],
                         var_name="stock",
                         value_name="stock value")
# Print the rearranged stocks data to see what the new dataframe looks like
print(stocks_new.head())
# Plot the stock values based on dates in a line plot
# Differentiate the lines by the stock names
fig4 = px.line(
    stocks_new, x="date", y="stock value", color="stock",
    hover_data=['stock']
)
fig4.show()
| | date | stock | stock value |
|---|---|---|---|
| 0 | 2018-01-01 | GOOG | 1.000000 |
| 1 | 2018-01-08 | GOOG | 1.018172 |
| 2 | 2018-01-15 | GOOG | 1.032008 |
| 3 | 2018-01-22 | GOOG | 1.066783 |
| 4 | 2018-01-29 | GOOG | 1.008773 |
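To see what `melt` does to the shape of the data, it can help to run it on a tiny frame. The sketch below uses the first two rows of the GOOG and AAPL columns from the stocks table; `wide` and `long` are illustrative names.

```python
import pandas as pd

# Two dates and two stocks, copied from the stocks table shown earlier
wide = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-08"],
    "GOOG": [1.000000, 1.018172],
    "AAPL": [1.000000, 1.011943],
})

# Each (date, stock) pair becomes its own row; "date" is repeated per stock
long = wide.melt(id_vars=["date"], var_name="stock", value_name="stock value")

print(wide.shape)  # 2 rows x 3 columns
print(long.shape)  # 4 rows x 3 columns
print(long)
```

The long layout has one column naming the stock, which is what lets `px.line` assign one color per stock through `color="stock"`.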
# Observe how the amount of tip changes with the total bill using a scatter plot
# Differentiate the scatters by sex
fig5 = px.scatter(
    tips, x="total_bill", y="tip", color="sex",
    hover_data=['sex']
)
fig5.show()
Recreate the bar plot below, which shows the population of each continent in 2007.
Hints:
# Load data
df = px.data.gapminder()
df.head()
| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | AFG | 4 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | AFG | 4 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | AFG | 4 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | AFG | 4 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | AFG | 4 |
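# Keep only the 2007 rows, then group the data by continent
# and sum the numeric columns to get each continent's total population
df_2007 = df.query('year==2007')
# numeric_only=True skips text columns such as country and iso_alpha
df_2007_new = df_2007.groupby('continent').sum(numeric_only=True)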
print(df_2007_new.head())
# Plot the continents' population totals in ascending order in a bar plot
# Differentiate the bars by continent
fig6 = px.bar(df_2007_new, x="pop", y=df_2007_new.index, color=df_2007_new.index, text="pop", orientation='h')
fig6.update_yaxes(categoryorder="max descending")
fig6.show()
| continent | year | lifeExp | pop | gdpPercap | iso_num |
|---|---|---|---|---|---|
| Africa | 104364 | 2849.914 | 929539692 | 160629.695446 | 23859 |
| Americas | 50175 | 1840.203 | 898871184 | 275075.790634 | 9843 |
| Asia | 66231 | 2334.040 | 3811953827 | 411609.886714 | 13354 |
| Europe | 60210 | 2329.458 | 586098529 | 751634.449078 | 12829 |
| Oceania | 4014 | 161.439 | 24549947 | 59620.376550 | 590 |
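The bar order above is controlled through `categoryorder` on the axis. Another option is to sort the data before plotting, sketched here with the `pop` totals printed above; `totals` and `ordered` are illustrative names.

```python
import pandas as pd

# Continent population totals for 2007, copied from the output above
totals = pd.Series(
    {
        "Africa": 929539692,
        "Americas": 898871184,
        "Asia": 3811953827,
        "Europe": 586098529,
        "Oceania": 24549947,
    },
    name="pop",
)

# Sort from the smallest to the largest population
ordered = totals.sort_values()
print(ordered.index.tolist())
# → ['Oceania', 'Europe', 'Americas', 'Africa', 'Asia']
```

On the real data the equivalent step would be `df_2007_new.sort_values("pop")`. Whether the plotted bars then follow the row order depends on the axis settings, so it is worth checking the figure after sorting.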